Interlingual annotation of parallel text corpora: a new framework for annotation and evaluation

Authors

  • Bonnie J. Dorr
  • Rebecca J. Passonneau
  • David Farwell
  • Rebecca Green
  • Nizar Habash
  • Stephen Helmreich
  • Eduard H. Hovy
  • Lori S. Levin
  • Keith J. Miller
  • Teruko Mitamura
  • Owen Rambow
  • Advaith Siddharthan
Abstract

This paper focuses on an important step in the creation of a system of meaning representation and the development of semantically-annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to annotate multiple translations of foreign-language texts with interlingual content. Three levels of representation are introduced: deep syntactic dependencies (IL0), intermediate semantic representations (IL1), and a normalized representation that unifies conversives, non-literal language, and paraphrase (IL2). The resulting annotated, multilingually-induced, parallel corpora will be useful as an empirical basis for a wide range of research, including the development and evaluation of interlingual NLP systems and paraphrase-extraction systems as well as a host of other research and development efforts in theoretical and applied linguistics, foreign language pedagogy, translation studies, and other related disciplines.

Similar resources

Interlingual Annotation of Parallel Text Corpora: A New Framework for Annotation and Evaluation

This paper focuses on the next step in the creation of a system of meaning representation and the development of semantically-annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to provide parallel corpora annotated with detailed deep ...

Interlingual Annotation Of Multilingual Text Corpora

This paper describes a multi-site project to annotate six sizable bilingual parallel corpora for interlingual content. After presenting the background and objectives of the effort, we will go on to describe the data set that is being annotated, the interlingua representation language used, an interface environment that supports the annotation task and the annotation process itself. We will then...

Semantic Annotation and Lexico-Syntactic Paraphrase

The IAMTC project (Interlingual Annotation of Multilingual Translation Corpora) is developing an interlingual representation framework for annotation of parallel corpora (English paired with Arabic, French, Hindi, Japanese, Korean, and Spanish) with deep-semantic representations. In particular, we are investigating meaning equivalent paraphrases involving conversives and non-literal language us...

Interlingua Development and Testing through Semantic Annotation of Multilingual Text Corpora

This paper describes a multi-site project to annotate the interlingual content of six sizable bilingual parallel corpora. The project addresses several principal problems in parallel: specification of interlingua content and notation, development of reliable annotation methods, and evaluation of annotated corpora. As a by-product, a growing corpus of annotated texts is being produced, which may...

Semantic Annotation of Multilingual Text Corpora

This paper describes a multi-site project to annotate six sizable bilingual parallel corpora for interlingual content. After presenting the background and objectives of the effort, we describe the data set that is being annotated, the interlingua representation language used, an interface environment that supports the annotation task and the annotation process itself. We then present our evalua...

Journal:
  • Natural Language Engineering

Volume 16

Publication date: 2010